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ABSTRACT 

The modeling and analysis generic interface for external numerical codes (MAGIX) is a model optimizer developed under the frame- 
work of the coherent set of astrophysical tools for spectroscopy (CATS) project. The MAGIX package provides a framework of an 
easy interface between existing codes and an iterating engine that attempts to minimize deviations of the model results from available 
observational data, constraining the values of the model parameters and providing corresponding error estimates. Many models (and, 
in principle, not only astrophysical models) can be plugged into MAGIX to explore their parameter space and find the set of param- 
eter values that best fits observational/experimental data. MAGIX complies with the data structures and reduction tools of ALMA 
(Atacama Large Millimeter Array), but can be used with other astronomical and with non-astronomical data. 
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1. Introduction 

Most physical or chemical models use a set of parameters. 
Finding the best description of observational/experimental data 
using a certain model implies determining the parameter set that 
most closely reproduces the data by some criteria, typically the 
minimum of a merit function. Often thex^ distribution^ is used, 
and we will use this term throughout, although it should be un- 
derstood that it could be replaced by other appropriate merit 
functions. Other important results are the goodness of fit, in 
absolute terms, and confidence levels for determined parame- 
ters. This is a generic problem independent of the actual model, 
and instead of implementing an optimizer in each and every 
program, parameter optimization can be separated. Therefore, a 
software package is needed that finds the best parameter set in an 
iterative procedure for arbitrary models by comparing the results 
of the physical model for a given parameter set with the experi- 
mental data set and modifying the parameter set to improve the 
quality of the description, i.e., by reducing the value of In 
general, the physical and chemical models are multidimensional 
non-linear functions of the input parameters. Thus, finding the 
best description for a given experimental data set means finding 
the global minimum of the function, which is a function of 
the model input parameters. 

Many optimization functions will find minima, but they 
could be local minima, which do not describe the results ade- 
quately, and can lead to misleading interpretations. Therefore, 
one has to find the global minimum of the x^ function to obtain 
a good description of the experimental data. However, finding 
the global minimum of an arbitrary function is challenging and 
has been practically impossible for many problems so far. To 



* http : //www. astro .uni-koeln. de/proj ects/schilke/M AGIX 
^ The distribution is a function of relative quadratic diff'erences 
between experimental and model values. 



circumvent this, we need algorithms that allow us to explore the 
landscape of the x^ function and calculate probabilities for the 
occurrence of minima. Combining certain algorithms and mak- 
ing use of the diflTerent advantages of the applied algorithms al- 
lows a reliable but not absolutely unique interpretation of the 
experimental data. Most of the algorithms are very computation- 
ally expensive, and the computational effort tends to scale with 
the degree of reliability. 

These requirements are very general. Hence it makes sense 
to generate a package that is able to read in experimental data, 
communicate with any registered external model program^ and 
compare automatically the result of the physical model with the 
given data through the figure of merit. It should improve the 
quality of the fit within an iterative procedure by adjusting the 
input parameters using several algorithms that fulfill the wide 
range of requirements mentioned above. 

To make MAGIX as flexible as possible, we developed it as 
a stand-alone program instead of a library for a certain program- 
ming language. A library is always coupled to a certain language 
and requires knowledge of their usage. A user without sufliicient 
experience would not benifit from MAGIX while a stand-alone 
program requires only the knowledge of how to start the model 
program. No further experience in software programming is nec- 
essary. Additionally, many model programs such as Radmc-3D 
or Lime are controlled by input files or by partial modification 
of their source code. Therefore, writing one or more files on the 
disk, starting the model program, and finally reading in the re- 
sult is inevitable. In addition, MAGIX offers the possibility to 
use a so-called RAM disk (or RAM drive), which is orders of 
magnitude faster than a normal hard disk. By using an RAM 
disk, the function evaluation / model computation becomes the 

^ Here, the phrase "external model program" means the external pro- 
gram that calculates the model function depending on several input pa- 
rameters. 
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dominant part in the whole process for nearly all external model 
programs. Therefore our approach will likely be more beneficial 
for the majority of users. 

In the following we describe the structure and functionality 
of the MAGIX package that implements this system. We start 
with an overview of the diff'erent parts of MAGIX, followed by 
a detailed description of the so-called registration process that 
couples the extended model to the fitter code. In §4 we explain 
the diff'erent algorithms included in the MAGIX package in more 
detail and end with an astrophysical application of MAGIX to 
absorption lines toward SgrB2(M) from the HEXOS GT KP 
(Schilkc etal 2010) ^ 

2. MAGIX 

MAGIX provides a generic toolbox for modeling data through 
external model programs. This requires registering the model, 
i.e., telling MAGIX about the input and output formats, and how 
to run the model. This has to be done only once for each model. 
Then, for each run, the model has to be instantiated, i.e., given 
the initial values and parameter boundaries. After an iterative 
procedure MAGIX provides the best-fit parameters, within the 
framework of the model, to a particular data set, including confi- 
dence intervals for the parameters. Internally, the parameter and 
model sets are stored in XML files, but this is transparent to the 
user. MAGIX consists of a frontend to create and manipulate the 
necessary XML files, modules for reading experimental data, the 
fitting engine, and an output module. 

All required XML files can be created from a GUI, which 
aspires to be self-explicable. The various XML files are: 

- The registration file, which contains a description of the 
structure of the input and output file(s) of the external model 
program (see §3). 

- The so-called instance file, which includes the names, initial 
values (and ranges) for all model parameters for a particu- 
lar fit. It works in conjunction with the registration file and 
indicates the model parameters that should be optimized by 
MAGIX and those that are to be kept fixed. 

- The XML file containing settings for the import/export of 
experimental data, i.e., path(s) and name(s) of the data file(s), 
format(s), range(s), etc. 

- The so-called algorithm control file, that defines which al- 
gorithm or algorithmic sequence MAGIX should use for the 
optimization together with the corresponding settings for the 
diff'erent algorithms (see §4). 

- The I/O control file, including the paths and file names of the 
aforementioned XML files. 

The final output of MAGIX includes the best-fit model, the best- 
fit parameters, and an estimate of the goodness of fit. Depending 
on the method chosen, it provides an exploration of the parame- 
ter space, i.e., information about the existence of other minima, 
and, optionally, the confidence intervals. 

Depending on the model and the chosen algorithm, the com- 
putational load is heavy (sometimes hundreds or thousands of 
function calls are necessary). OpenMP^ parallelization for simul- 
taneous evaluations of the external program is available for most 
of the algorithms (except for interval-nested sampling) if the ex- 
ternal model program fulfills certain requirements described in 
§3. 

^ An earlier version of this code was written by (Boone et al 2006). 
http://openmp.org/wp/ 



The OpenMP parallelization is used whenever the values 
for a set of independent so-called parameter vectors have to be 
calculated. (A parameter vector contains a value for each free 
parameter and describes a point within the parameter space). 
Within such a parallelized loop all subroutines used to calcu- 
lated the value of a certain parameter vector such as writ- 
ing the input file(s), reading in the output file(s) and calculat- 
ing the corresponding;^^ value are executed simultaneously. The 
system overhead is negligible for external model programs that 
require more than a few milliseconds for one function evalua- 
tion. Furthermore, the usage of an RAM disk as described above 
could reduce the system overhead as well. 

Additionally, MAGIX creates three types of log-files for 
each algorithm call: The first log-file contains the lowest x^ 
value with the corresponding values of the free parameters for 
each iteration step. The second log-file contains the correspond- 
ing input file(s) for the external model program for each iteration 
step. The comprehensive third log-file contains the;^^ values and 
their corresponding free parameter values of all function calls. 

A detailed description of all parts of MAGIX can be found 
in the MAGIX manual. 

MAGIX framework is written in Python, while various algo- 
rithm packages are written in Fortran for performance reasons. 
The following packages are required: python 2.6, NumPy L3, 
pyfits LO, gfortran 4.3, matplotlib 0.99, and scipy 0.9. The 
MAGIX package is a command-line-based program; therefore 
it can be called by other programs, e.g. CASA^. 

3. Registration 

In contrast to other astrophysical optimization packages MAGIX 
does not include any intrinsic model program that calculates the 
model function depending on a given parameter set. To commu- 
nicate with the external model program, MAGIX has to create 
the required input file(s) for every call of the external model pro- 
gram, including the modified parameter values and a directive 
how to read in the result of the model program. During every 
optimization step the values of the parameters to be optimized 
will have been modified. Consequently, MAGIX has to produce 
the actual input files that contain the new parameter values for 
the model to run at each subsequent function call. Therefore 
MAGIX has to be given directives how to create/write the in- 
put files that will be used in the function call. Hence the user 
has to define the structure of the input file(s), including loops 
with the information which parameter has to be written to which 
location. After each optimization step, MAGIX compares the ex- 
perimental data with the values of the model function calculated 
by the external program with the latest values of the parame- 
ters. The diff'erence between data and model is quantified by the 
value of;^^, i.e., low values of;^^ correspond to small diff'erences. 
Therefore, the path, the file name(s), and the format of the out- 
put file(s) of the model program have to be defined as well. The 
descriptions of both the input and the output files of the model 
are given in the XML file that we call registration file. 

Additionally, the registration file indicates whether the ex- 
ternal model program can be used in a parallelized MAGIX run 
or not, i.e., if it is possible to execute two or more instances of 
the same external model program on the same machine at the 
same time. This depends on how and where output or interme- 
diate files of the model are written, since diff'erent instances of 
models running in parallel must not overwrite each other's files. 
This is mainly a bookkeeping problem. 

^ http://casa.nrao.edu/ 
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Fig.l. Analytical test functions: a) The Himmelblau func- 
tion f(xi,X2) = (x^ + X2 - 11)^ + (xi + x\ - 7)^ has four 
identical minima at /(3.0,2.0) = /(-2.805118, 3.131312) = 
/(-3.779310, -3.283186) = /(3.584428, -1.848126) = 0. 
b) The Rosenbrock function /(xi , X2) = (1 - xi)^ + 100 (x2 - x^)^ 
has one global minimum at /(1, 1) = 0. c) The Rastrigin function 
f{xi , X2) = 10 + {x\ - 10 cos(27r • jci)) + 10 + (jc^ - 10 cos(27r • JC2)) 
has one global minimum at /(0, 0) = 0. The contour plots of the 
different test functions are plotted in the second column. The po- 
sitions of the global minima for each test function are indicated 
by red dots. 

Ideally, a model has to be registered only once, i.e., it is 
not necessary to register an already registered model again as 
long as the structure of the input and output file(s) is unchanged. 
Whenever one wants to optimize some parameter(s) of the model 
with MAGIX, it should be sufficient to edit the instance XML 
file. 

MAGIX comes with a suite of pre-registered models^, and 
the authors are willing to assist with the registration of new mod- 
els. 

4. Optimization algorithms 

MAGIX provides optimization through one of the following al- 
gorithms or via a combination of several of them (algorithm 
chain, see §4.10): the Levenberg-Marquardt (conjugate gradi- 
ent) method (§4.1), which is fast, but can get stuck in local min- 
ima, simulated annealing (§4.2) and particle swarm optimization 
methods (§4.3), which are slower, but more robust against local 
minima. Other, more modem methods, such as bees (§4.4), ge- 
netic (§4.5), nested sampling (§4.6), or interval nested sampling 
algorithms (§4.7) are included as well for exploring the solu- 
tion landscape, checking for the existence of multiple solutions, 
and giving confidence ranges. Additionally, MAGIX provides 

^ At the moment the following software packages are registered: 
SimLine, Lime, Radmc-3D, myXCLASS, RADEX 




Fig. 2. Results for the Rosenbrock function using the LM algo- 
rithm with start values xi = 4.5, X2 = 1.0: The distribution of 
the parameter values after five function calls (x^ = 1.38 • 10"^) is 
indicated by green-yellow points. The sequence of iterations is 
color-coded (color bar on the lower right). Early points are de- 
noted in dark green, points toward the end of the iteration are 
light yellow. The red dot denotes the global minimum of the 
Rosenbrock function. 



an interface to make several algorithms included in the scipy ^ 
package available. 

In the following, we give a short description of the op- 
timization algorithms that are implemented in MAGIX using 
the analytic Himmelblau-, Rosenbrock-, and Rastrigin func- 
tions (Eig. 1) as test functions for demonstration, with multi- 
ple minima (Himmelblau, Rastrigin) and a very shallow min- 
imum (Rosenbrock). The optimization problem is solved di- 
rectly through these algorithms via stochastic searching without 
derivatives or gradient information (except for the Levenberg- 
Marquardt algorithm). 

4.1. Levenberg-Marquardt algorithm 

The Levenberg-Marquardt algorithm (LM), (Marquardt 1963, 
Nocedal & Wright 2006, Press et al 2007)^ is a hybrid be- 
tween the Gauss-Newton algorithm and the method of gradi- 
ent descent. The Gauss-Newton algorithm is a method for solv- 
ing non-linear least- squares problems. It is a modification of 
Newton's method to find the minimum of a function, but is con- 
strained so that it can only minimize a sum of square function 
values. It requires knowledge of the gradients in;^^ space, which 
can be obtained from diff'erential steps for sufficiently smooth 
functions. The LM can find a minimum (possibly local) even if 
it starts very far from it, but the efficiency depends on the land- 
scape of the parameters. On the other hand, for functions and 
starting values of parameters that are very close to the final min- 
imum, the LM tends to be slower than Gauss-Newton. The LM 



^ http://www.scipy.org/ 

^ While we often refer to Numerical Recipes (NR) to explain the al- 
gorithms, no actual NR algorithms are included in the package. 
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Fig. 3. Results for the Rastrigin function using the S A algorithm 
with start values xi = -1.0, X2 = -1.0: The distribution of the 
parameter values after 155 function calls (x^ = 4.37 • 10"^) is 
indicated by green-yellow points. The sequence of iterations is 
color-coded (color bar on the lower right). Early points are de- 
noted in dark green, points toward the end of the iteration are 
light yellow. The red dot denotes the global minimum of the 
Rastrigin function. 



is an algorithm that strongly depends on the starting values of the 
parameters that are to be optimized, and the user should choose 
the starting values very carefully. Otherwise the algorithm can 
easily become stuck in a side minimum of the global solution. 

MAGIX contains a modified version of the MINPACK pack- 
age implementation (Garbow et al. 1980). The gradient of the 
function is calculated in a parallel environment using OpenMP. 
Furthermore, the user can define the variation var used for the 
gradient determination. Because MAGIX cannot determine the 
gradient analytically, MAGIX has to use a numerical approxi- 
mation: (d/dxi)f(x) = (f(xi -\- h) - f(xi))/h, where the variation 
h is defined hy h = Xi • var. Varying the size of the variation may 
be necessary if the function is not a smooth function and the 
calculation of the gradient produces awkward results. As shown 
in Fig. 2, the fast convergence of the LM is obvious, where the 
minimum is found after five (!) iterations. 

4.2. Simulated annealing 

Simulated annealing (SA), (Press et al. 2007) is a generic prob- 
abilistic computational method that is used for the problem of 
global optimization, i.e., to find a good approximation to the 
global optimum of a given function in a large parameter space. 
For certain problems, SA is more eff'ective than exhaustive enu- 
meration - provided that the goal is merely to find an acceptably 
good solution in a fixed amount of time, rather than the best pos- 
sible solution. 

In comparison to the LM, SA is more robust. Its result does 
not depend so much on the neighborhood of the starting point. 
The LM searches for the highest (negative) gradient and stops 
when it detects a local minimum. In contrast, SA can check if 



the gradient around a local minimum is flat (low perturbation), in 
which case it will continue to find a better minimum, i.e., a lower 
value than the one found before. In Fig. 3, the SA algorithm 
is able to find the global minimum of the Rastrigin function at 
xi = and X2 = although it starts in a local minimum at 
xi = -1 and X2 = -1. 

The name and inspiration of this algorithm come from the 
process of annealing in metallurgy. This technique involves heat- 
ing and controlled cooling of a material, with the aim to gradu- 
ally increase the size of its crystals and reduce their defects. The 
heat provides energy and causes the atoms to move from their 
initial position (which was a local minimum of internal energy) 
and wander randomly through states of higher energy. Then, a 
slow cooling gives them more chances of finding configurations 
of lower internal energy than the initial one. 

By analogy to the physical process, each step of SA replaces 
the current solution by a random nearby solution. This nearby 
solution is chosen with a probability that depends both on the 
diff'erence between the corresponding function values and on a 
global temperature, T. The temperature T is gradually decreased 
(multiplied by the temperature reduction coefficient, ^ < 1) dur- 
ing the process. 

The dependency of the temperature diff'erence between two 
subsequent steps is such that the solution changes almost ran- 
domly when T is high, but it is modified increasingly toward 
lower values as T becomes zero. Allowing for increasing val- 
ues prevents the method from becoming stuck at local minima 
- which can happen with gradient methods such as the LM. 
Subsequent points of the SA algorithm follow a perpendicular 
direction. 

MAGIX uses a partially parallelized implementation of the 
scipy algorithm using OpenMP. 

4.3. Particle swarm optimization 

The particle swarm optimization (PSO) algorithm implemented 
in MAGIX is a hybrid (Fan & Zahara 2007) between a particle 
swarm optimization algorithm (Kennedy & Eberhart 1995) and 
a Nelder-Mead simplex search method (Nelder & Mead 1965). 
The PSO optimizes a problem by iteratively trying to improve 
a candidate solution according to some measure of quality; the 
particles are sent toward a better solution flying so much faster 
according to the technique's performance in previous step. The 
Nelder-Mead technique or downhill simplex search method is 
a traditional technique for the direct search of function minima; 
it is easy to use, does not need calculation of derivatives and 
therefore can converge even to non- stationary solutions. 

Both techniques have been modified by Fan and Zahara 
(Fan & Zahara 2007) and their combination is the hybrid algo- 
rithm that is available in MAGIX. The PSO algorithm performs 
a kind of heuristics: It starts from an initial random population of 
particles and searches in the neighborhood for a global optimum. 
The particles with the best-fit function values are updated with 
the simplex method, while the particles with the poorest function 
values are updated with the PSO. 

The whole procedure prevents the algorithm from being 
trapped locally and at the same time allows it to find the global 
minimum. It repeats itself until a termination criterion or the 
maximum number of iterations is reached. The iteration shown 
in Fig. 4 stops after 819 function calls because the value of 
dropped below the limit of = 9 • 10"^. The corresponding 
parameter values are xi = 0.9954, and X2 = 0.9907 instead 
of xi = 1 and X2 = I. Reducing the limit of the x^ value 
would lead to a better description of the global minimum but it 



4 



T.MoUer^^ a/.: MAGIX 




Fig. 4. Results for the Rosenbrock function using the PSO algo- 
rithm: The distribution of the parameter values after 819 func- 
tion calls (x^ = 231 ' 10"^) is indicated by green-yellow points. 
The sequence of iterations is color-coded (color bar on the lower 
right). Early points are denoted in dark green, points toward the 
end of the iteration are light yellow. The red dot denotes the 
global minimum of the Rosenbrock function. (The clustering of 
the function calls right above the global minimum is caused by 
the very flat gradient of the Rosenbrock function in this area.) 



would require more function calls. The algorithm implemented 
in MAGIX is parallelized using OpenMP. 



4.4. Bees algorithm 

The bees algorithm (Pham et al. 2005) is a swarm algorithm that 
performs a kind of neighborhood search combined with a ran- 
dom search (see, Fig. 5). This name was given to the algorithm 
because it tries to mimic the collection of nutriments by honey 
bees, in that there is always a part of the population that per- 
forms the role of scouts, traveling far away in random directions 
to detect new nutrition sources. 

The algorithm starts with an initial set of parameter vectors 
(a collection of particles or scout bees, i.e., the hive of bees), 
randomly selected and such that it spreads throughout the en- 
tire parameter space. After the fitness of each bee is evaluated 
in terms of the quality ranking it just visited, the bees with the 
highest fitness visit the neighborhoods of the sites they currently 
are at. Of the remaining bees some are sent away to random sites 
and some are sent to search in the neighborhood of the very best 
sites as well (so that there are more bees searching for food in 
places where it is more probable to find nutrition sources). At the 
end of the step, the fitness of each visited site is evaluated, and 
the bees who just visited them move away or search in the closer 
vicinity of the best sites. The result is that the bees algorithm 
finds areas of local minima, see Fig. 5. 



Fig. 5. Results for the Himmelblau function using the bees algo- 
rithm: The distribution of the parameter values after 1477 func- 
tion calls is indicated by green-yellow points. The sequence of 
iterations is color-coded (color bar on the lower right). Early 
points are denoted in dark green, points toward the end of the 
iteration are light yellow. The red dots denote the four global 
minima of the Himmelblau function. 



4.5. Genetic algorithm 

The genetic algorithm (GA) is a probabilistic search algorithm 
that mimics the process of natural evolution. It iteratively trans- 
forms a set of parameter vectors (population), each with an as- 
sociated fitness value, into a new population of objects. 

The procedure takes place using the Darwinian principle of 
natural selection, with operations that are patterned after nat- 
urally occurring genetic operations such as recombination and 
mutation. Recombination is the joined process of reproduction 
and crossover, i.e., mixing the genetic matter (parameter values) 
of the parents (parameter vector) (Whitley 1994). Mutation is 
the complete disappearance of specific genetic material and its 
transformation to a completely diff'erent material; this happens 
especially in combinations of genetic material (parameter sets) 
that does not fit. 

The evolution starts from a randomly selected population 
(of parameter sets) and proceeds in evolutionary generations 
(stages/iteration steps). When a generation step is completed, 
the fitness of all members of the population is evaluated. Then a 
number of parameter sets is stochastically selected for modifica- 
tion; the fittest members are more likely to be kept unmodified. 
The rest of the population members are modified; the modifi- 
cations they go through are more intense because they are fit. 
The modified and unmodified parameter sets make up the new 
population whose fitness will be evaluated at the end of the step 
(Herrera et al 1998, Herrera et al 2005) (see. Fig. 6). 

4.6. Nested sampling 

The nested sampling (NS) (Skilling 2006, 

Feroz & Hobson 2008) algorithm included in MAGIX is a 
combination of a Monte Carlo method and Bayesian statis- 
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parameter x 



Fig. 6. Results for the Rosenbrock function using the GA: The 
distribution of the parameter values after 1112 function calls 
ix^ = 9.33 • 10"^) is indicated by green-yellow points, where 
the dark green points indicate function calls that are made at 
the beginning of the fit process and light yellow points represent 
function calls at the end of the fit process. The red dot denotes 
the global minimum of the Rosenbrock function. 



tics (Sivia & Skilling 2006). The Monte Carlo methods are 
a family of computational algorithms that perform repeated 
random sampling and compute the results for every sample. 
The Bayesian statistics is a statistical inference technique, i.e., 
it attempts to draw conclusions from data subject to random 
variation. In particular, the Bayesian inference calculates the 
probability that a hypothesis is true, i.e., how probable it is that 
the newly calculated value of a parameter is closer to the real 
one (the value that best fits experimental data); if it is more 
probable than the previously calculated, then the parameter 
value is updated. 

The principle of the NS algorithm is illustrated in Fig. 7: The 
Bayesian approach is used to obtain a set of physical parameters 
= (0i',62, . . . ,0n) that attempts to describe the experimental 
data. The Bayesian analysis is assumed to incorporate the prior 
knowledge with a given set of current observations to make sta- 
tistical inferences. The prior information 7r(0) can come from 
observational data or from previous experiments. Bayes theo- 
rem states that the posterior probability distribution of the model 
parameters is given by 



Pr(0) 



i;(0)7r(0) 



(1) 



where Pr(0) is the posterior probability distribution of the 
model parameters, is the likelihood of the data for the 

given model and its parameters, ;7r(0) is a prior information, and 
Z is Bayesian evidence. The Bayesian evidence is the average 
likelihood of the model in parameter space. It is given by the 
following integral over the ^-dimensional space: 




Xi 1 



Fig. 7. Principle of the NS algorithm (taken from 
(Skilling and MacKay 2012)): The upper part of both pan- 
els describes a contour plot of a likelihood function £(6). Left 
panel a): Each point 6 within parameter space p defined by the 
parameter ranges of Oi and 62 is associated with the volume that 
would be enclosed by the contour L = £(6). (L(x) is the contour 
value such that the volume enclosed is x). If the points 6 are 
uniformly distributed under the prior probability distribution 
(prior), all these volumes (x- values) are uniformly distributed 
between and 1. Right panel b): Using a Markov chain method, 
the NS algorithm takes a point (purple dot) from p satisfying 
L > L(xi). Inserting the new point into this distribution, we can 
find the highest x- value X2 used for the next iteration. 

The NS algorithm transforms the integral (2) to the single 
dimension by re-parametrization to a new linear variable^ - a 
prior volume X. The volume of parameter space can be divided 
into elements dX = 7r(0)J0. The prior volume X can be accu- 
mulated from its elements dX in any order, so we construct it as 
a function of decreasing likelihood: 



X(A)= [ 71(e) de. 
j£m>A 



(3) 



That means that the cumulative prior volume covers all like- 
lihood values exceeding A. As A increases, the enclosed volume 
X decreases from X(0) = 1 to X(l) = 0. If the prior information 
;r(0) is uniformly distributed in the parameter space, equation 
(2) for the evidence transforms into 



Jo 



£(X)dX. 



(4) 



One can calculate the partial likelihood as L/ = Jl(Xi), where 
the Xi is a sequence of decreasing values, such that 

< ... <X2 <Xi < 1. (5) 

MAGIX uses the trapezoid rule to approximate the evidence 

m V V 

Z„ where Z, = . 



(6) 



£(e)n(e)de. 



(2) 



^ Although we do not know the values of these volumes X, we know 
the order of them because L(xi) = £(6) 
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Fig. 8. Results for the Himmelblau function using the NS algo- 
rithm: The distribution of the parameter values after 2599 func- 
tion calls is indicated by green-yellow points, where the dark 
green points indicate function calls which are made at the begin- 
ning of the fit process and light yellow points represent function 
calls at the end of the fit process. The red dots denote the four 
minima of the Himmelblau function. 



The NS algorithm is targeted at calculating Bayesian evi- 
dence, but it assists in obtaining a posterior sample of points 
from which one can estimate uncertainties of parameter values. 
The prior volume X/, which corresponds to the likelihood con- 
tour L/, is usually evaluated with a random number generator. 

The actual algorithm is based on the Markov chain Monte 
Carlo (MCMC) methods (Press et al 2007, Gilks et al 1996, 
Diaconis 2009). Those are a family of algorithms that sam- 
ple from probability distributions, with the aim to construct a 
(Markov) chain with the desired distribution (a distribution with 
the desired properties) as the equilibrium distribution. The qual- 
ity (how well the parameter sets fit the data) of the sample is a 
monotonically increasing function of the number of steps (see. 
Fig. 7). 

An MCMC algorithm is used to reduce the dimensionality 
of the parameter space through integration. The Bayesian ap- 
proach to this technique allows one not only to find multiple 
solutions, but also to define proportional weights of parameter 
values and to evaluate the Bayesian evidence^^. The NS algo- 
rithm included in MAGIX requires fewer samples than standard 
MCMC methods (Feroz & Hobson 2008) while also providing 
posterior probabilities for each one of the best parameter vec- 
tors. The NS algorithm has the following advantages: It is a non- 
derivative method and investigates the landscape of optimization 
function. Additionally, it can find multiple minima. On the other 
hand, the algorithm does not converge to a global minimum and 



Bayesian interpretation of probability: As the number of steps in- 
creases, we collect evidence with regard to the consistency or incon- 
sistency of that evidence with a given hypothesis. More specifically, as 
the evidence accumulates, we tend to believe in the given hypothesis 
more or less, depending on the degree to which the increasing evidence 
agrees with that hypothesis. 



depends strongly on the random number generator. The results 
for the Himmelblau function, shown in Fig. 8, are similar to the 
results of the bees algorithm. Both algorithms can be used to ex- 
plore the landscape of the;^^ distribution and for finding areas of 
global minima. The implementation of the algorithm included in 
MAGIX is paralleHzed using OpenMP. 

4. 7. Interval nested sampling algorithm 

The Interval Nested Sampling (INS) algorithm (developed by 
I. Bernst^^) included in MAGIX is an implementation of the 
branch-and-bound algorithm (Ichida & Fujii 1979) to find the 
next prior volume Xf. The algorithm is based on the NS algo- 
rithm (§4.6) and uses an interval method for the definition of 
a next part of the prior volume. The main principle of the in- 
terval method is a division of the parameter space into interval 
boxes and an estimation of the optimization function value over 
the boxes. The estimate is called inclusion function and can be 
calculated by various methods. The centered form of inclusion 
with slopes is used in the current version of the INS algorithm 
(Krawczyk 1985). The interval method assists in finding the next 
prior volume X/ by determining the ratio of the volume of the 
working interval box Z/ to the whole volume Z of the parame- 
ter space. Posterior weights of points after the NS process are 
calculated using the Bayes theorem (see. Fig. 9): 



Z' 



(7) 



Using the sequence of posterior samples of parameter vec- 
tors we are able to determine the reliability of model parameters 
such as standard deviations or to construct posterior distributions 
of parameter values. The mean value /d(6j) of each model param- 
eter 6j, j = (I, . . . ,n) and its standard deviation cr(Oj) are given 
as 



(8) 



and 



criOj) 



(9) 



where k indicates the number of points in the sample. The 
INS algorithm is capable of handling multi-modality of the op- 
timization function, phase transitions, and strong correlations 
between model parameters. As shown in Fig. 9, the Interval 
Nested Sampling algorithm requires fewer function calls than 
other global optimizers. Additionally, it is used to determine 
the confidence intervals of parameters, which is described in the 
next section. 

4.8. Error estimation 

MAGIX provides an error estimation for each single parameter 
at the point of minimum. A schematic diagram of the error es- 
timation module is shown in Fig. 10: After some optimization 
procedure MAGIX determines a point of minimum. The input 
values for the error estimation are given by the point of minimum 
and the parameter space. To determine the error of a parameter Oj 



A paper describing the INS algorithm in detail is in preparation. 
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Fig. 9. Results for the Himmelblau function using the INS algorithm: Left panel: The distribution of the parameter values after 251 
function calls is indicated by green-yellow points, where the dark green points indicate function calls that are made at the beginning 
of the fit process and light yellow points represent function calls at the end of the fit process. The red dots denote the four minima 
of the Himmelblau function. Right panel: Posterior weights of points after the NS process. 



at the minimum, MAGIX varies this parameter within the given 
parameter range, while the other parameters are kept constant. 
The INS algorithm is applied and returns a set of parameter val- 
ues with proportional weights and the logarithm of the Bayesian 
evidence for parameter Oj for a sequence of parameter values 
distributed over the whole parameter space. If the distribution 
of parameter values has only one minimum, Eqn. (8) - (9) can 
be applied to calculate the mean value and the standard devia- 
tion. Sometimes there are several minima in the sequence, hence 
these formulas cannot be used directly because the resulting esti- 
mation of the mean value and standard deviation produces mean- 
ingless results (see Figs. 9 and 11). 

We are interested in the uncertainty around the considered 
minimum. Therefore, we need to estimate the mean value and 



the standard deviation in the minimum. This is done using a clus- 
tering method: 

- Calculate the distances from the minimum to all points of the 
sample. 

- Sort the points depending on their distances (ascending or- 
der). 

- Select the points with a function value lower than the 

boundaries for the 99 percent confidence region A;^^^ 
(a = 0.99). 



First optimization 
algorithm: 



LM, SA, GA, PSO, Bees 




Input: 

Point of minimum 
^Parameter space^ 



Input: 
Number of free 

parameters 
Parameter space 



Output: 



Point of minimum 



^ Output: \ 

For each parameter: 

Error_left 
Error_right 
IVIean value 
Standard deviation 
Logarithm of evidence 



One parameter is variable. 
Other parameters 
are kept constant at 
the point of minimum 

interval Nested 
Sampling algorithm 



Set of parameter values 
with proportional weights. 
Logarithm of the evidence 



Clustering method 



For variable parameter: 
Mean value 
Standard deviation 



Fig. 10. Schematic diagram of error estimation module of 
MAGIX. 
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Fig. 11. Distribution of parameter values with several minima 
where a direct application of Eqn. (8) - (9) is not possible. (Here, 
/u(6j) indicates the mean value fi(Oj), criOj) is the standard devia- 
tion (T(6j), and "min" represents the value of the parameter Oj of 
the best-fit result). 
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Fig. 12. Schematic diagram of error estimation module of 
MAGIX. (Here, /u(6j) indicates the mean value /d(Oj), cr(Oj) rep- 
resents the standard deviation cr(6j), and "min" the value of the 
parameter Oj of the best-fit result). 

Finally, the new sample of m points (m < k) is distributed around 
the minimum point (Fig. 11, gray box) and the mean value of the 
parameter Oj is calculated as follows: 



(10) 



In Bayesian statistics one usually applies a credible interval 
(or Bayesian confidence interval) to the parameter value. For a 
single parameter and a sample of points that can be summarized 
in a single sufficient statistic, it can be shown (Jaynes 1976) 
that the credible interval and the confidence interval coincide if 
the unknown parameter is a location parameter (i.e., the forward 
probability function has the form Pr{6i\ii{6i)) = f(6 - jd) ) with 
a uniform prior distribution. Hence credible intervals in our case 
are analogous to confidence intervals in frequentist statistics. 
Using the mean value /d(6j) and the standard deviation cr(Oj), 
the 3(T confidence interval of the parameter Oj is given by 



Pr(M(Oj) - cT(Oj) < Oj < jj(Oj) + cT(Oj)) ^ 0.99. 



(11) 



In Fig. 12 the example of a histogram of distribution of pa- 
rameter values after error estimation is shown. Hence for the pa- 
rameter value at the point of minimum (using the error left and 
the error right). 



parameters, and on the ranges for each free parameter. Every al- 
gorithm has a diff'erent strategy for finding the minimum of the 

function. Whether the strategy is successful depends on the 
function and many other conditions. For example, the bees al- 
gorithm requires a huge computational eff'ort, but this algorithm 
explores the whole landscape of thex^ function within the given 
ranges. It gives a good overview of the landscape of the prob- 
lem and can in principle find even minima in narrow valleys. 
On the other hand the computational eff'ort is heavy, and the al- 
gorithm is inefficient if computing the external model requires 
more than a few seconds. Here, the computation time of an algo- 
rithm depends on the number of function calls for each iteration, 
the computation time of the external model program, the possi- 
bility of executing the external model program more than once 
on the same machine at the same time and on the size of the 
parameter space. Other swarm algorithms such as the particle 
swarm or NS algorithms might be better if the x'^ function has a 
smoother shape. For example, the particle swarm algorithm in- 
cluded in MAGIX makes use of a Nelder-Mead simplex search 
method, which accelerates the whole computation if the;^^ func- 
tion has no narrow valleys, i.e, thex^ function does not change 
drastically when one or more parameters are varied. 

In Table 1 the total cost (in function evaluations) for each 
test function (Rastrigin, Rosenbrock, and Himmelblau function) 
for each global optimization algorithm is shown. As mentioned 
above, the total cost of an algorithm, i.e., the number of function 
evaluations depends strongly on many parameters. Therefore, 
Table 1 can give only a very rough overview of the efficiency 
for each algorithm. For example, the INS algorithm is quite effi- 
cient in finding the global minimum of the Rastrigin function but 
less efficient in finding the global minimum of the Rosenbrock 
function. 

In general, the global optimization algorithms are more ef- 
ficient in finding the areas of global minima. But they are less 
efficient in finding the exact properties of the global minima. 
For that purpose, local optimizers, such as the LM or SA are 
much more efficient. The combination of diff'erent algorithms is 
the best strategy to find the complete description of the experi- 
mental data. 



4.10. Algorithm chain 

Therefore, MAGIX includes the possibility to send the results 
of the optimization process performed by one algorithm to an- 
other optimization loop through some diff'erent algorithm. As 
mentioned above, the SA as well as the LM algorithm require 
starting values of the parameters that are optimized, i.e., the user 



Pr(6i(mm) - errorMt < Oj < ^/(min) -h errornght) ~ 0.99. (12) 

The Bayesian evidence usually plays an important role in 
model selection but in parameter estimation the evidence fac- 
tor is ignored because it is an integrated value over the whole 
parameter space. The INS algorithm calculates the logarithm of 
the evidence and can be used to estimate the quality of the fit- 
ting procedure. A high absolute value of the evidence logarithm 
indicates a big uncertainty of the parameters. 



4.9. Which algorithm should be used? 

In principle, it is impossible to answer the question which al- 
gorithm is best. All algorithms included in MAGIX try to find 
the global minimum of the x^ function, which depends on the 
observational data, on the external model, on the free and fixed 



Table 1. Total cost (in function evaluations) for each test func- 
tion for each global optimization algorithm. 



algorithm 


Rastrigin 


Rosenbrock 


Himmelblau 


name 


function 


function 


function 




/^limit ~~ 1 


A limit ^ 


y2 ^ 5 . 10-4 
A limit 


Bees 


1220 


14491 


101664 


PSO 


1317 


535 


770 


Genetic 


241 


533 


1626 


NS 


4230 


5080 


8720 


INS 


20 


1144 


168 



Notes. The algorithms stop if the value of dropped below the given 
limit of;^^. We neglect here the so-called local optimizer such as LM 
and SA algorithm because the efficiency of these algorithms depends 
strongly on the starting values. 
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Fig. 13. Results for the Rosenbrock function using an algorithm chain: Left panel: The distribution of the parameter values for the 
bees algorithm (green-yellow points, where the dark green points indicate function calls that are made at the beginning of the fit 
process and light yellow points represent function calls at the end of the fit process) and for the following simulated annealing 
algorithm (white points). The red dot indicates the global minimum of the Rosenbrock function. Right panel: Posterior weights of 
points after the NS process used by the error estimation module. 



has to find a good fit by hand before applying these algorithms 
produces useful results. Often, the location of the minimum can 
be guessed with sufficient accuracy to give good starting val- 
ues, but sometimes one is completely in the dark. Using an algo- 
rithm chain, the user can first apply one of the swarm algorithms, 
e.g., the bees or NS algorithm, to determine the starting values 
for the subsequent local optimization algorithm using SA or the 
LM algorithm. As shown in Fig. 13, we used an algorithm chain 
to determine the global minimum of the Rosenbrock function 
where we first applied the bees algorithm to explore the land- 
scape of the problem. Using the best result of the bees algorithm 
(the parameter set that corresponds to the lowest value, here 
jci = 1.7044 and X2 = 2.8679) as starting point for the SA algo- 
rithm, we find the global minimum of the Rosenbrock function. 
MAGIX does not only allow one to use the best but also the sec- 
ond best etc. result of a swarm algorithm as starting values for 
other algorithms. Therefore, we are able to find multiple minima 
of models that behave as the Himmelblau function. 

To determine the confidence intervals for the parameters that 
are optimized, the user has to set the error estimation algorithm 
as the last algorithm of the algorithm chain. The confidence in- 
tervals for the parameters that are used in the example shown in 



Table 2. Best parameter set with the corresponding confidence 
intervals (x^ = 2.5576 • 10"^) after applying an algorithm chain. 



parameter 


value at 


error 


error 


log 


name 


minimum 


left 


right 


(evidence) 




0.9954 


3.43 • 10"^ 


2.85 • 10-2 


-3.309 


X2 


0.9905 


7.38 • 10-2 


8.21 . 10-2 


-2.846 



Notes. We used an algorithm chain consisting of the bees, the SA and 
the error estimation algorithm, to determine the global minimum of the 
Rosenbrock function. The bees algorithm requires only a specification 
of the range for each free parameter. Here, the parameters xi and X2 
vary between -5 and +5. 



Fig. 13 are given in Table 2. The upper right panel of Fig. 13 
shows the probability for a minimum along the xi axis hold- 
ing parameter X2 fixed at X2 = 0.9905. The lower right panel of 
Fig. 13 shows the probability for a minimum along the X2 axis 
holding parameter xi fixed at xi = 0.9954. These two panels 
clearly show that there is only one minimum within the given 
parameter range. 



5. Examples 

The first astrophysical application of MAGIX was fitting 
Herschel/HIFI spectra of an ortho-H20^ line in Band 4b with 



HgO"" in SgrB2(M) Rest Frequency (MHz) „ 

1.116 10^ 1.1155 lO" 1.115 10^ 
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Fig. 14. Fit of H2O+ using myXCLASS and MAGIX. In blue one 
specific component depicts the intrinsic hyperfine structure of 
the lines, while the green bars locate the positions of the velocity 
components. 
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myXCLASS^^, in particular absorption lines toward SgrB2(M) 
from the HEXOS GT KP (Schilke et al 2010). Here, the Hne 
shape is determined by the hyperfine structure and the veloc- 
ity structure, resulting in a complicated pattern. The fit was per- 
formed using the LM algorithm. As can be seen in Fig. 14, the 
match is convincing, but questions about the uniqueness of the 
fit and confidence intervals for the free parameters, about 100 in 
this case do arise. 

Additionally, we fit HIFI bands 4b and 5a toward SgrB2(M) 
(Schilke et al. 2010) with myXCLASS, simultaneously using an 
algorithm chain starting with the genetic algorithm and 78 free 
parameters (we used 26 velocity components where each com- 
ponent has three free parameters.). Here, we used the parameter 
values of the best fit-result of the genetic algorithm as starting 
values for the SA. At the end of the algorithm chain, we applied 
the error estimation algorithm to determine the left and right er- 
ror for each optimized parameter. The final result is shown in 
Fig. 15, where the dashed red (blue) lines indicate a model where 
we reduced (increase) the free parameter values of the best fit by 
the left (right) error of each free parameter. Clearly, we achieve 
an excellent description of the absorption lines and quantify the 
uncertainty of the model. In contrast to the previous fit shown in 
Fig. 14, where only one data set was fitted, we defined no start- 
ing values but only ranges for each free parameter and fitted two 
bands simultaneously. 



6. Conclusions 

The MAGIX package presented in this paper is a very helpful 
tool for modeling physical and chemical data using an arbitrary 
external model program. It represents a highly flexible toolbox 
where in principle any theoretical external model program can 
be plugged in. Thus, MAGIX can be used to optimize the de- 
scription of even non-astrophysical data. Furthermore, MAGIX 
is able to explore the landscape of the function without the 
knowledge of starting values and can calculate probabilities for 
the occurrence of minima. Therefore, MAGIX can find multiple 
minima and give information about confidence intervals for the 
parameters. Additionally, the MAGIX package contains the pos- 
sibility to combine algorithms in a so-called algorithm chain and 
make use of the advantages of the diff'erent algorithms included 
in the package. If the external model program fulfills certain re- 
quirements, MAGIX can run in a parallel mode to speed up the 
computation. 
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The myXCLASS program (https: //www. astro. uni-koeln. 
de/projects/schilke/XCLASS) accesses the CDMS ( 
(2005, Muller^^a/. 2001) http://www.cdms.de) and JPL ( 
(Pickett a/. 1998) http://spec.jpl.nasa.gov) molecular 
data bases. The calculated spectrum is a solution of the radiative 
transfer equation in one dimension (detection equation) 



(13) 



where the first sum goes over all molecules m and the second sum runs 
over the corresponding components c of molecule m. T(y)'"'^ indicates 
the optical depth of the component m, c and 77 (^m,c) represents the cor- 
responding beam-filling factor. 
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Fig. 15. Fit of ortho H20^ ground state lines in HIFI bands a) 4b 
and b) 5a toward SgrB2(M) with myXCLASS using an algo- 
rithm chain consisting of the genetic, the SA, and the error esti- 
mation algorithm. Owing to the high number of free parameters 
(78) it is not possible to show the distribution of the parameter 
values as in Figs. 2 - Fig. 13. 
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